
Why Millennials Love Prenups

The New Yorker

Long the province of the ultra-wealthy, prenuptial agreements are being embraced by young people--including many who don't have all that much to divvy up. More than forty per cent of millennials and Gen Z-ers claim to have signed a prenup. Andrea Zevallos declared 2016 her "year of dating." She was twenty-seven, working at Universal Studios Hollywood, the theme park, and determined to find love. She calculated it would take three dates a week. By December, she was losing hope. "It was exhausting," she said. Then, while scrolling OkCupid, she noticed a "cute guy" with a "Hamilton" reference in his handle. His name was Alex Switzky, and like her he was a musical-theatre enthusiast and aspiring screenwriter. He was different from the other men she'd met. On their second date, he started planning a third. Zevallos was used to L.A. guys who were cagey about any sort of calendar. One day, Switzky called her. Accustomed to texts, she assumed that he was about to break up with her. "The most millennial response," she recalled, laughing.


Becoming a Centenarian

The New Yorker

Like The New Yorker, I was born in 1925. Somewhat to my surprise, I decided to keep a journal of my hundredth year. The author, who was born on December 17, 1925, notes that the magazine's first issue came out ten months before he did. Old age is no joke, but it can feel like one. You look everywhere for your glasses, until your wife points out that you're wearing them. I turn a hundred this year. People act as though this is an achievement, and I suppose it is, sort of. Nobody in my family has lived this long, and I've been lucky. I'm still in pretty good health, no wasting diseases or Alzheimer's, and friends and strangers comment on how young I look, which cues me to cite the three ages of man: Youth, Maturity, and You Look Great. On the other hand, I've lost so many useful abilities that my wife, Dodie, and I have taken to calling me Feebleman. Look, up in the sky! No, it's ... Dodie doesn't want me to know how old she is, but she's nearly three decades younger than I am, and I become ...


CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

He, Jie, Bai, Richard He, Williamson, Sinead, Pan, Jeff Z., Jaitly, Navdeep, Zhang, Yizhe

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-preserving data synthesis framework using QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end via a single language modeling loss, with gradients flowing through both modules using a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments across multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
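The abstract's key mechanism is a differentiable top-k estimator that lets the single language-modeling loss propagate through document selection into the reranker. The paper's exact estimator is not specified here; one standard relaxation is successive softmax with suppression of already-selected items, which the following sketch illustrates (function name and parameters are illustrative, not CLaRa's API):

```python
import numpy as np

def soft_topk(scores, k, tau=0.1):
    """Relaxed top-k selection: returns a soft k-hot weight vector.

    In each of k rounds, a softmax distributes one unit of selection
    mass over the items; the mass already assigned to an item is then
    suppressed (in log space) so later rounds pick different items.
    As tau -> 0 the result approaches a hard k-hot indicator of the
    top-k scores, while remaining differentiable for tau > 0.
    """
    s = np.asarray(scores, dtype=float)
    khot = np.zeros_like(s)          # accumulated soft selection weights
    mask = np.zeros_like(s)          # log-space suppression of picked items
    for _ in range(k):
        logits = (s + mask) / tau
        logits = logits - logits.max()        # numerical stability
        p = np.exp(logits)
        p = p / p.sum()
        khot += p
        mask += np.log(np.clip(1.0 - p, 1e-12, 1.0))
    return khot
```

With a small temperature, `soft_topk([3, 1, 2, 0], k=2)` concentrates almost all weight on the two highest-scoring items, so downstream computations that multiply by these weights see (approximately) only the selected documents, while gradients still flow to every score.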


Yanyun-3: Enabling Cross-Platform Strategy Game Operation with Vision-Language Models

Wang, Guoyan, Huang, Yanyan, Chen, Chunlin, Wang, Lifeng, Sun, Yuxiang

arXiv.org Artificial Intelligence

Cross-platform strategy game automation remains a challenge due to diverse user interfaces and dynamic battlefield environments. Existing Vision-Language Models (VLMs) struggle with generalization across heterogeneous platforms and lack precision in interface understanding and action execution. We introduce Yanyun-3, a VLM-based agent that integrates Qwen2.5-VL for visual reasoning and UI-TARS for interface execution. We propose a novel data organization principle -- combination granularity -- to distinguish intra-sample fusion and inter-sample mixing of multimodal data (static images, multi-image sequences, and videos). The model is fine-tuned using QLoRA on a curated dataset across three strategy game platforms. The optimal strategy (M*V+S) achieves a 12.98x improvement in BLEU-4 score and a 63% reduction in inference time compared to full fusion. Yanyun-3 successfully executes core tasks (e.g., target selection, resource allocation) across platforms without platform-specific tuning. Our findings demonstrate that structured multimodal data organization significantly enhances VLM performance in embodied tasks. Yanyun-3 offers a generalizable framework for GUI automation, with broader implications for robotics and autonomous systems.
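The "combination granularity" principle distinguishes fusing several modalities inside one training sample from mixing single-modality samples within a batch. A toy sketch of that distinction (the data layout and field names are hypothetical, not the paper's actual format):

```python
def organize(images, videos, mode):
    """Illustrate two combination granularities for multimodal data.

    'fuse' -> intra-sample fusion: one training sample carries several
              modalities together.
    'mix'  -> inter-sample mixing: each modality stays in its own sample,
              and the samples are interleaved at the batch level.
    """
    if mode == "fuse":
        return [{"images": list(images), "videos": list(videos)}]
    if mode == "mix":
        return ([{"images": [im]} for im in images]
                + [{"videos": [v]} for v in videos])
    raise ValueError(f"unknown mode: {mode}")
```

The paper's finding is that which granularity you choose (e.g., its M*V+S setting) materially changes downstream quality, so the choice deserves the same care as model or optimizer selection.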


Shall We Play a Game? Language Models for Open-ended Wargames

Matlin, Glenn, Mahajan, Parv, Song, Isaac, Hao, Yixiong, Bard, Ryan, Topp, Stu, Montoya, Evan, Parwani, M. Rehan, Shetty, Soham, Riedl, Mark

arXiv.org Artificial Intelligence

Wargames are simulations of conflicts in which participants' decisions influence future events. While casual wargaming can be used for entertainment or socialization, serious wargaming is used by experts to explore strategic implications of decision-making and experiential learning. In this paper, we take the position that Artificial Intelligence (AI) systems, such as Language Models (LMs), are rapidly approaching human-expert capability for strategic planning -- and will one day surpass it. Military organizations have begun using LMs to provide insights into the consequences of real-world decisions during _open-ended wargames_ which use natural language to convey actions and outcomes. We argue that the ability of AI systems to influence large-scale decisions motivates additional research into the safety, interpretability, and explainability of AI in open-ended wargames. To demonstrate, we conduct a scoping literature review with a curated selection of 100 unclassified studies on AI in wargames, and construct a novel ontology of open-endedness based on the creativity afforded to players and adjudicators and the novelty provided to observers. Drawing from this body of work, we distill a set of practical recommendations and critical safety considerations for deploying AI in open-ended wargames across common domains. We conclude by presenting the community with a set of high-impact open research challenges for future work.


Evaluation Framework for Highlight Explanations of Context Utilisation in Language Models

Sun, Jingyi, Atanasova, Pepa, Choudhury, Sagnik Ray, Islam, Sekh Mainul, Augenstein, Isabelle

arXiv.org Artificial Intelligence

Context utilisation, the ability of Language Models (LMs) to incorporate relevant information from the provided context when generating responses, remains largely opaque to users, who cannot determine whether models draw from parametric memory or provided context, nor identify which specific context pieces inform the response. Highlight explanations (HEs) offer a natural solution, as they can point to the exact context pieces and tokens that influenced model outputs. However, no existing work evaluates their effectiveness in accurately explaining context utilisation. We address this gap by introducing the first gold standard HE evaluation framework for context attribution, using controlled test cases with known ground-truth context usage, which avoids the limitations of existing indirect proxy evaluations. To demonstrate the framework's broad applicability, we evaluate four HE methods -- three established techniques and MechLight, a mechanistic interpretability approach we adapt for this task -- across four context scenarios, four datasets, and five LMs. Overall, we find that MechLight performs best across all context scenarios. However, all methods struggle with longer contexts and exhibit positional biases, pointing to fundamental challenges in explanation accuracy that require new approaches to deliver reliable context utilisation explanations at scale.
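Given controlled test cases with known ground-truth context usage, a highlight explanation can be scored by comparing the tokens it marks against the gold tokens. The abstract does not state which metrics the framework uses; token-level F1 is one natural choice, sketched below with hypothetical token indices:

```python
def highlight_f1(pred_tokens, gold_tokens):
    """Token-level F1 between a predicted highlight and the gold highlight.

    Each argument is a collection of token positions (or token ids) the
    explanation marks as influential. Returns 0.0 when either set is
    empty or there is no overlap.
    """
    pred, gold = set(pred_tokens), set(gold_tokens)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)            # tokens correctly highlighted
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting tokens {2, 3, 5} when the ground truth is {2, 3, 4} gives precision and recall of 2/3 each, hence F1 of 2/3; a perfect highlight scores 1.0.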


Just-In-Time Objectives: A General Approach for Specialized AI Interactions

Lam, Michelle S., Shaikh, Omar, Xu, Hallie, Guo, Alice, Yang, Diyi, Heer, Jeffrey, Landay, James A., Bernstein, Michael S.

arXiv.org Artificial Intelligence

Large language models promise a broad set of functions, but when not given a specific objective, they default to milquetoast results such as drafting emails littered with cliches. We demonstrate that inferring the user's in-the-moment objective, then rapidly optimizing for that singular objective, enables LLMs to produce tools, interfaces, and responses that are more responsive and desirable. We contribute an architecture for automatically inducing just-in-time objectives by passively observing user behavior, then steering downstream AI systems through generation and evaluation against this objective. Inducing just-in-time objectives (e.g., "Clarify the abstract's research contribution") enables automatic generation of tools, e.g., those that critique a draft based on relevant HCI methodologies, anticipate related researchers' reactions, or surface ambiguous terminology. In a series of experiments (N=14, N=205) on participants' own tasks, JIT objectives enable LLM outputs that achieve 66-86% win rates over typical LLMs, and in-person use sessions (N=17) confirm that JIT objectives produce specialized tools unique to each participant.
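At its core, the architecture is an induce-then-optimize loop: infer a just-in-time objective from observed behavior, then generate candidates and evaluate them against that single objective. A minimal best-of-n sketch, where `generate` and `score` are stand-ins for LLM calls (hypothetical interfaces, not the paper's actual API):

```python
def best_of_n(objective, generate, score, n=4):
    """Generate n candidates and keep the one scoring highest against
    the inferred just-in-time objective.

    objective -- a natural-language goal induced from user behavior,
                 e.g. "Clarify the abstract's research contribution"
    generate  -- callable(objective) -> candidate output
    score     -- callable(objective, candidate) -> numeric quality
    """
    candidates = [generate(objective) for _ in range(n)]
    return max(candidates, key=lambda c: score(objective, c))
```

The point of the pattern is that both generation and evaluation are conditioned on the same narrow, in-the-moment objective, rather than on a generic "be helpful" prompt.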


ALLOY: Generating Reusable Agent Workflows from User Demonstration

Li, Jiawen, Ning, Zheng, Tian, Yuan, Li, Toby Jia-jun

arXiv.org Artificial Intelligence

Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: Users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a ''successful'' prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.


IDfRA: Self-Verification for Iterative Design in Robotic Assembly

Khendry, Nishka, Margadji, Christos, Pattinson, Sebastian W.

arXiv.org Artificial Intelligence

As robots proliferate in manufacturing, Design for Robotic Assembly (DfRA), the practice of designing products for efficient automated assembly, is increasingly important. Traditional approaches to DfRA rely on manual planning, which is time-consuming, expensive and potentially impractical for complex objects. Large language models (LLMs) have exhibited proficiency in semantic interpretation and robotic task planning, stimulating interest in their application to the automation of DfRA. But existing methodologies typically rely on heuristic strategies and rigid, hard-coded physics simulators that may not translate into real-world assembly contexts. In this work, we present Iterative Design for Robotic Assembly (IDfRA), a framework using iterative cycles of planning, execution, verification, and re-planning, each informed by self-assessment, to progressively enhance design quality within a fixed yet initially under-specified environment, thereby replacing physics simulation with the real world itself. The framework accepts as input a target structure together with a partial environmental representation. Through successive refinement, it converges toward solutions that reconcile semantic fidelity with physical feasibility. Empirical evaluation demonstrates that IDfRA attains 73.3% top-1 accuracy in semantic recognisability, surpassing the baseline on this metric. Moreover, the resulting assembly plans exhibit robust physical feasibility, achieving an overall 86.9% construction success rate, with design quality improving across iterations, albeit not always monotonically. Pairwise human evaluation further corroborates the advantages of IDfRA relative to alternative approaches. By integrating self-verification with context-aware adaptation, the framework evidences strong potential for deployment in unstructured manufacturing scenarios.
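IDfRA's cycle of planning, execution, verification, and re-planning can be sketched as a generic control loop; the callables below are placeholders for the framework's LLM planner, robot executor, and self-assessment step, not its actual interfaces:

```python
def iterative_design(target, env, plan_fn, execute_fn, verify_fn, max_iters=5):
    """Plan-execute-verify-replan loop in the spirit of IDfRA.

    plan_fn(target, env, feedback) -> plan (feedback is None on iter 1)
    execute_fn(plan)               -> observation from the real world,
                                      which here plays the simulator's role
    verify_fn(target, observation) -> (ok, feedback) self-assessment
    Returns the last (plan, observation), stopping early on success.
    """
    feedback = None
    plan = obs = None
    for _ in range(max_iters):
        plan = plan_fn(target, env, feedback)
        obs = execute_fn(plan)
        ok, feedback = verify_fn(target, obs)
        if ok:
            break
    return plan, obs
```

Because verification runs on real-world observations rather than a hard-coded simulator, each iteration's feedback already reflects the physical feasibility that heuristic pipelines have to approximate.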


EclipseTouch: Touch Segmentation on Ad Hoc Surfaces using Worn Infrared Shadow Casting

Mollyn, Vimal, DeVrio, Nathan, Harrison, Chris

arXiv.org Artificial Intelligence

The ability to detect touch events on uninstrumented, everyday surfaces has been a long-standing goal for mixed reality systems. Prior work has shown that virtual interfaces bound to physical surfaces offer performance and ergonomic benefits over tapping at interfaces floating in the air. A wide variety of approaches have been previously developed, to which we contribute a new headset-integrated technique called EclipseTouch. We use a combination of a computer-triggered camera and one or more infrared emitters to create structured shadows, from which we can accurately estimate hover distance (mean error of 6.9 mm) and touch contact (98.0% accuracy). We discuss how our technique works across a range of conditions, including surface material, interaction orientation, and environmental lighting.
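Estimating hover distance from a cast shadow reduces, in the simplest case, to similar triangles: a point IR source offset from the fingertip projects a shadow whose displacement along the surface grows with hover height. The sketch below is a simplified single-emitter model, not the system's actual multi-emitter calibration; all parameters are assumptions:

```python
def hover_height(shadow_offset, emitter_height, emitter_baseline):
    """Estimate fingertip hover height h above a flat surface.

    Point IR source at height H and horizontal offset b from the
    fingertip's surface projection; the fingertip at height h casts a
    shadow displaced by d along the surface. Similar triangles give
        d = h * b / (H - h)   =>   h = d * H / (b + d)
    All distances share one unit (e.g., millimetres).
    """
    d, H, b = shadow_offset, emitter_height, emitter_baseline
    return d * H / (b + d)
```

A touch event is then simply the hover height crossing (approximately) zero, i.e., the shadow converging onto the fingertip; the paper's structured-shadow setup with multiple emitters makes this estimate robust to lighting and surface material.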